Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map.
Abstract
The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date.
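The abstract says that TWINSCAN combines a GENSCAN-style single-genome model with information from genome comparison, but it does not say how the two are joined. The sketch below illustrates one way to do it. It is loosely modeled on TWINSCAN's idea of a per-base conservation sequence ('|' match, ':' mismatch, '.' unaligned), which is derived from alignments of the target (human) to an informant (mouse). That conservation string is scored by its own per-state model, and its log-odds are added to the DNA log-odds. Everything named here is a hypothetical illustration, not TWINSCAN's code or trained parameters. That includes `conservation_sequence`, `CONS_MODEL`, the toy probabilities, the zero-order models, the gap-as-mismatch rule, and the priority rule for overlapping alignments.

```python
import math
from typing import Dict, List, Tuple

# Conservation symbols, one per target (human) base.
MATCH, MISMATCH, UNALIGNED = "|", ":", "."
# Assumed priority when several alignments cover the same target base.
_RANK = {UNALIGNED: 0, MISMATCH: 1, MATCH: 2}


def conservation_sequence(target_len: int,
                          alignments: List[Tuple[int, str, str]]) -> str:
    """Label each target base as matched, mismatched, or unaligned.

    Each alignment is (start, target_row, informant_row): `start` is the
    0-based target coordinate of the first aligned target base, and the two
    rows are gapped strings ('-' for gaps) of equal length.
    """
    cons = [UNALIGNED] * target_len
    for start, t_row, i_row in alignments:
        if len(t_row) != len(i_row):
            raise ValueError("alignment rows must have equal length")
        pos = start
        for t, i in zip(t_row, i_row):
            if t == "-":
                continue  # extra informant base: no target position consumed
            # A gap opposite a target base counts as a mismatch (assumption).
            sym = MATCH if t.upper() == i.upper() else MISMATCH
            if _RANK[sym] > _RANK[cons[pos]]:
                cons[pos] = sym
            pos += 1
    return "".join(cons)


# Hypothetical zero-order emission probabilities; a real predictor would
# train higher-order models per state on annotated data.
CONS_MODEL: Dict[str, Dict[str, float]] = {
    "coding": {MATCH: 0.6, MISMATCH: 0.2, UNALIGNED: 0.2},
    "noncoding": {MATCH: 0.2, MISMATCH: 0.1, UNALIGNED: 0.7},
}


def conservation_log_odds(cons: str) -> float:
    """Log-likelihood ratio (coding vs. non-coding) of a conservation string."""
    return sum(math.log(CONS_MODEL["coding"][c])
               - math.log(CONS_MODEL["noncoding"][c]) for c in cons)


def combined_log_odds(dna_log_odds: float, cons: str) -> float:
    """Add DNA and conservation evidence, treating them as independent."""
    return dna_log_odds + conservation_log_odds(cons)


if __name__ == "__main__":
    cons = conservation_sequence(10, [(2, "ACG-TA", "ACGGTT")])
    print(cons)  # ..||||:...
    print(round(combined_log_odds(0.5, cons), 3))
```

Because the DNA and the conservation string are treated as independent given the state, their log-odds simply add. Under that assumption, conservation evidence can be folded into the same dynamic-programming search that a single-genome predictor already uses. This also suggests why sparse alignments, as from low-coverage shotgun reads, still help: unaligned bases just contribute the '.' term rather than breaking the model.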
Similar resources
Automated whole-genome multiple alignment of rat, mouse, and human.
We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment and consists of two main steps: (1) alignment of the mouse and rat genomes, and (2) alignment of human to either the mouse-...
Genetic anchoring of whole-genome shotgun assemblies
The recent advances in sequencing throughput and genome assembly algorithms have established whole-genome shotgun (WGS) assemblies as the cornerstone of the genomic infrastructure for many species. WGS assemblies can be constructed with comparative ease and give a comprehensive representation of the gene space even of large and complex genomes. One major obstacle in utilizing WGS assemblies for...
Proposal to Sequence the Genome of the Chicken
The chicken genome has a haploid content of 1.2 × 10⁹ base pairs (bp) of DNA; approximately 40% that of either mouse or human. The strategy that will be employed to sequence this genome is to assemble 6-fold whole-genome shotgun coverage of the genome and to order and orient the resulting sequence scaffolds by alignment to end sequences of BACs in a comprehensive contig map. This is in contrast ...
Comparing vertebrate whole-genome shotgun reads to the human genome.
Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper...
Augmenting Chinese hamster genome assembly by identifying regions of high confidence.
Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly er...
Journal title:
- Genome Research
Volume 13, Issue 1
Publication date: 2003